-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add internal definitions for S3 ls recursive #4467
Conversation
This pull request has been linked to: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For as much feedback as I've left, I consider this PR in pretty good shape.
class VFSTest
needs to be made more compliant with C.41. It can't entirely yet because it needs to deal with "not supported" compile environments- There's some other smallish polishing items.
The most significant thing is going to be to change the callback type to a template argument. This is done all over <algorithm>
, for example. This will ease the next step of hooking this code up by wrapping the C callback that will come in through the API.
85ca26d
to
03bddbf
Compare
03bddbf
to
a9c3195
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need to purge the word "callback" from most of this, along with its abbreviations. See specific comments.
- I believe we should be able to have this PR not use
#ifdef HAVE_S3
anywhere but in a few root places. It's too prevalent right now.
Previous review isn't complete. I pressed "send" too early. To be continued... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think all the
setup_test()
functions should go. If they can't become the constructor body, then we need to identify the problems with initialization and solve them properly.
I really hate the way github makes it so easy to send the review early. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a different interface for the function supplied as a selection predicate. The other issues are more about code structure and don't really affect this issue. Let's work on this one last.
Done for now. |
+ Docs
194b8e3
to
8da19b7
Compare
I addressed the remaining feedback and updated unit tests today.. As a final step I ran the VCF query in the original story and the 3202 results are collected but the iteration never ends, so there's an edge case here that needs fixed and coverage added. 🧐 |
Fixed the bug I ran into before the holidays and added test coverage. I created a story for follow up to inherit from FilesystemBase since it's pure virtual and would require implementing all the methods in this PR. I also noticed some of the methods in FilesystemBase don't exist for cloud backends and vice versa, so there will likely be more refactoring for FilesystemBase once it's being used for a cloud backend and not just local Posix. I added more detail to the story with some example code to show what I'm thinking there. It's not a ton of work but seems mostly out of scope for this PR. https://app.shortcut.com/tiledb-inc/story/38117 I'm not sure where ls_recursive fits into FilesystemBase since it's a template method that can't be virtual, so open to suggestions there. Otherwise all feedback has been addressed and this is ready for review 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- All the new tests are at the unit level. We don't want new unit tests in
tiledb_unit
(much of which is not unit tests).- They should appear in the test directory for the unit, which is
sm/filesystem/test
. - Test support code should go in the most local directory possible. In this case this is the same directory.
- After these changes, we shouldn't need to touch the
/test
directory at all. - There are two
class VFS
. Probably don't need two.
- They should appear in the test directory for the unit, which is
- See https://en.cppreference.com/w/cpp/named_req/InputIterator.
- The iterator doesn't need
begin()
orend()
. - Need an internal function
ensure_dereferenceable()
to throw whenptr_ == nullptr
.
- The iterator doesn't need
@@ -201,6 +201,7 @@ set(TILEDB_CORE_SOURCES | |||
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/uri.cc | |||
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/vfs.cc | |||
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/vfs_file_handle.cc | |||
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/filesystem/ls_scanner.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of enlarging TILEDB_CORE_SOURCES
, we should be able to add vfs
to TILEDB_CORE_OBJECTS_LIBS
instead. This will require removing some sources. If this doesn't just work without a hitch, we'll do it in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing, I ran into a good bit of linkage errors here today so created SC-38601 for follow up.
I addressed all other feedback here but didn't quite make it to relocating tests. I created SC-38602 for relocating tests out of tiledb_unit, but if this PR is open after the holidays I'm happy to give it a shot here.
Damn github UI. Still reviewing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the previous comments:
- The
end_
member variable is fundamentally broken. We can't use a pointer for the purpose that it's being put to. This is pretty specific tobegin_
andend_
. Their details need to be changed, but that's it. They're not being used in any conceptually incorrect way.
But for code organization (previously submitted), and this new issue, everything looks pretty good. Overall code structure is definitely in a good state.
@eric-hughes-tiledb as long as the external API is finalized here, we need to wrap this up ASAP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good enough for now.
We'll need to change class LsScanIterator
to take an iterator type instead of an underlying data type when we generalize it past the current use only in S3. It works only because the iterator type declared (from vector
) happens to match the one in the AWS library. That won't be true in general.
@@ -142,7 +169,9 @@ class LsScanIterator { | |||
* @return Reference to this iterator after advancing to the next object. | |||
*/ | |||
LsScanIterator& operator++() { | |||
scanner_->next(ptr_); | |||
if (++ptr_ != pointer()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a double increment, since next()
is being called below. I'll let it pass for now since the unit tests pass, but this might be defective.
This adds internal definitions for
sm::VFS::ls_recursive
andS3::ls_filtered
for S3 only. Results are collected using aS3Scanner
class that manages fetching results from S3, and advancesLsScanIterator
until the next accepted result is reached. TheS3Scanner::iterator()
method returns an iterator for the scan which can be passed to STL constructors or algorithms using input iterators.TYPE: FEATURE
DESC: Add internal definitions for S3 ls recursive